A de-identifier for medical discharge summaries

Authors

  • Özlem Uzuner
  • Tawanda C. Sibanda
  • Yuan Luo
  • Peter Szolovits
Abstract

OBJECTIVE Clinical records contain significant medical information that can be useful to researchers in various disciplines. However, these records also contain personal health information (PHI) whose presence limits the use of the records outside of hospitals. The goal of de-identification is to remove all PHI from clinical records. This is a challenging task because many records contain foreign and misspelled PHI; they also contain PHI that is ambiguous with non-PHI. These complications are compounded by the linguistic characteristics of clinical records. For example, medical discharge summaries, which are studied in this paper, are characterized by fragmented, incomplete utterances and domain-specific language; they cannot be fully processed by tools designed for lay language.

METHODS AND RESULTS In this paper, we show that we can de-identify medical discharge summaries using a de-identifier, Stat De-id, based on support vector machines and local context (F-measure=97% on PHI). Our representation of local context aids de-identification even when PHI includes out-of-vocabulary words and even when PHI is ambiguous with non-PHI within the same corpus. Comparison of Stat De-id with a rule-based approach shows that local context contributes more to de-identification than dictionaries combined with hand-tailored heuristics (F-measure=85%). Comparison with two well-known named entity recognition (NER) systems, SNoW (F-measure=94%) and IdentiFinder (F-measure=36%), on five representative corpora shows that when the language of documents is fragmented, a system with a relatively thorough representation of local context can be a more effective de-identifier than systems that combine (relatively simpler) local context with global context. Comparison with a Conditional Random Field De-identifier (CRFD), which utilizes global context in addition to the local context of Stat De-id, confirms this finding (F-measure=88%) and establishes that strengthening the representation of local context may be more beneficial for de-identification than complementing local with global context.
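
The abstract does not list the features Stat De-id uses, so the following is only an illustrative sketch of what a "local context" representation might look like: each token is described by its own surface form and by the words in a small window around it, and a feature dictionary of this kind could then be vectorized and passed to a support vector machine. The function name, the window size of 2, the `<pad>` marker, and the example sentence are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List


def local_context_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, object]:
    """Describe tokens[i] by its surface form and its neighbours within `window` positions.

    Hypothetical feature set for illustration only; not the paper's actual features.
    """
    target = tokens[i]
    feats: Dict[str, object] = {
        "word": target.lower(),
        "is_capitalized": target[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in target),
        "length": len(target),
    }
    # Add the lowercased neighbouring words, padding past the sentence edges.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        neighbour = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats[f"word[{offset:+d}]"] = neighbour
    return feats


# Made-up fragment in the clipped style of a discharge summary (not real patient data).
tokens = "Pt seen by Dr. Smith on 03/14 for chest pain".split()
features = local_context_features(tokens, tokens.index("Smith"))
print(features)
```

In this sketch the token "Smith" is never looked up in a name dictionary; the preceding "dr." is what would signal a likely name. That mirrors the abstract's point that local context can help even when PHI is out of vocabulary or ambiguous with non-PHI.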

Similar resources

The MITRE Identification Scrubber Toolkit: Design, training, and assessment

PURPOSE Medical records must often be stripped of patient identifiers, or de-identified, before being shared. De-identification by humans is time-consuming, and existing software is limited in its generality. The open source MITRE Identification Scrubber Toolkit (MIST) provides an environment to support rapid tailoring of automated de-identification to different document types, using automatica...

Reducing medication errors in hospital discharge summaries: a randomised controlled trial.

OBJECTIVES To evaluate whether pharmacists completing the medication management plan in the medical discharge summary reduced the rate of medication errors in these summaries. DESIGN Unblinded, cluster randomised, controlled investigation of medication management plans for patients discharged after an inpatient stay in a general medical unit. SETTING The Alfred Hospital, an adult major refe...

Dictated versus database-generated discharge summaries: a randomized clinical trial.

BACKGROUND Hospital discharge summaries communicate information necessary for continuing patient care. They are most commonly generated by voice dictation and are often of poor quality. The objective of this study was to compare discharge summaries created by voice dictation with those generated from a clinical database. METHODS A randomized clinical trial was performed in which discharge sum...

Delirium in hospital: an underreported event at discharge.

OBJECTIVE Delirium, an important event in hospital, is associated with significant mortality and morbidity. Most patients with delirium recover fully; however, when left untreated, delirium may progress to stupor, coma, or death. Delirium is less likely to resolve completely in elderly patients in whom persistent cognitive deficits commonly occur. The extent to which this information is availab...

Research Paper: Electronically Screening Discharge Summaries for Adverse Medical Events

OBJECTIVE Detecting adverse events is pivotal for measuring and improving medical safety, yet current techniques discourage routine screening. The authors hypothesized that discharge summaries would include information on adverse events, and they developed and evaluated an electronic method for screening medical discharge summaries for adverse events. DESIGN A cohort study including 424 rando...

Journal:
  • Artificial intelligence in medicine

Volume 42, Issue 1

Pages: -

Publication date: 2008